Distributional Word Clustering in Parallel
Abstract
We discuss various methods which have been applied to grouping words into syntactic and semantic categories, primarily how they deal with the problems of sparsity and computational complexity. We then present a method of distributional clustering, and discuss the parallelization of the most computationally intensive part of this process.
Similar Resources
An Information Theoretic Approach to Bilingual Word Clustering
We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high quality word clusters jointly in any number of languages. The monolingual component of our objective is the average mutual information of clusters of adjacent words in each language, while the...
Resolving Translation Ambiguity Using Non-Parallel Bilingual Corpora
This paper presents an unsupervised method for choosing the correct translation of a word in context. It learns disambiguation information from nonparallel bilingual corpora (preferably in the same domain) free from tagging. Our method combines two existing unsupervised disambiguation algorithms: a word sense disambiguation algorithm based on distributional clustering and a translation disambigu...
Automatically Discovering Word Senses
We will demonstrate the output of a distributional clustering algorithm called Clustering by Committee that automatically discovers word senses from text.
Semantic Clustering of Russian Web Search Results: Possibilities and Problems
The present paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs on large Russian corpora and then apply the data to cluster the results of Mail.ru search according to meanings in the query. We compare different methods of performing such clustering and different source corpora. Models of applying distributional semantics to big linguistic data are d...
Russian Named Entities Recognition and Classification Using Distributed Word and Phrase Representations
The paper presents results on Russian named entities classification and equivalent named entities retrieval using word and phrase representations. It is shown that a word or an expression’s context vector is an efficient feature to be used for predicting the type of a named entity. Distributed word representations are now claimed (and on a reasonable basis) to be one of the most promising distr...
Publication date: 2006